General Deviants: An Analysis of Perturbations in Compressed Sensing

Matthew A. Herman and Thomas Strohmer 



Abstract 

We analyze the Basis Pursuit recovery of signals with general perturbations. Previous studies have only considered partially perturbed observations $Ax + e$. Here, $x$ is a signal which we wish to recover, $A$ is a full-rank matrix with more columns than rows, and $e$ is simple additive noise. Our model also incorporates perturbations $E$ to the matrix $A$ which result in multiplicative noise. This completely perturbed framework extends the prior work of Candès, Romberg and Tao on stable signal recovery from incomplete and inaccurate measurements. Our results show that, under suitable conditions, the stability of the recovered signal is limited by the noise level in the observation. Moreover, this accuracy is within a constant multiple of the best-case reconstruction using the technique of least squares. In the absence of additive noise, numerical simulations essentially confirm that this error is a linear function of the relative perturbation.



I. Introduction



Employing the techniques of compressed sensing (CS) to recover signals with a sparse representation has enjoyed a great deal of attention over the last 5-10 years. The initial studies considered an ideal unperturbed scenario:

$$b = Ax. \tag{1}$$

Here $b \in \mathbb{C}^m$ is the observation vector, $A \in \mathbb{C}^{m \times n}$ is a full-rank measurement matrix or system model (with $m < n$), and $x \in \mathbb{C}^n$ is the signal of interest which has a sparse, or almost sparse, representation under some fixed basis. More recently researchers have included an additive noise term $e$ into the received signal [1]-[4], creating a partially perturbed model:

$$\hat{b} = Ax + e. \tag{2}$$

This type of noise typically models simple errors which are uncorrelated with $x$.

As far as we can tell, practically no research has been done yet on perturbations $E$ to the matrix $A$.¹ ² Our completely perturbed model extends (2) by incorporating a perturbed sensing matrix in the form of

$$\hat{A} = A + E.$$

It is important to consider this kind of noise since it can account for precision errors when applications call for physically implementing the measurement matrix $A$ in a sensor. In other CS scenarios, such as when $A$ represents a system model, $E$ can absorb errors in assumptions made about the transmission channel. This can be realized in radar [7], remote sensing [8], telecommunications, source separation [5], [6], and countless other problems. Further, $E$ can also model the distortions that result when discretizing the domain of analog signals and systems; examples include jitter error and choosing too coarse a sampling period.

The authors are with the Department of Mathematics, University of California, Davis, CA 95616-8633, USA (e-mail: {mattyh, strohmer}@math.ucdavis.edu).
This work was partially supported by NSF Grant No. DMS-0811169 and NSF VIGRE Grant No. DMS-0636297. 

1 A related problem is considered in [5] for greedy algorithms rather than $\ell_1$-minimization, and in a multichannel rather than a single channel setting; it mentions using different matrices on the encoding and decoding sides, but its analysis is not from an error or perturbation point of view.

2 At the time of revising this manuscript we became aware of an earlier study [6] which discusses the error resulting from estimating the mixing matrix in source separation problems. However, it only covers strictly sparse signals, and its analysis is not as in depth as presented in this manuscript.

In general, these perturbations can be characterized as multiplicative noise, and are more difficult to analyze than simple additive noise since they are correlated with the signal of interest. To see this, simply substitute $A = \hat{A} - E$ in (2);³ there will be an extra noise term $Ex$.
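As a short worked illustration of that substitution (using only the model $\hat{b} = Ax + e$ and $\hat{A} = A + E$ from this section), the observation can be rewritten relative to the perturbed matrix:

$$\hat{b} = Ax + e = (\hat{A} - E)x + e = \hat{A}x + (e - Ex).$$

Seen from the perturbed matrix $\hat{A}$, the effective noise is therefore $e - Ex$. The second part grows and shrinks with the signal itself, which is why it behaves like multiplicative rather than additive noise.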

The rest of this section establishes certain assumptions and notation necessary for our analysis. Section II first gives a brief review of previous work on the partially perturbed scenario in CS, and then presents our main theoretical and numerical results on the completely perturbed scenario. Section III provides proofs of the theorems, and Section IV compares the CS solution with classical least squares. Concluding remarks are given in Section V, and a brief discussion on different kinds of perturbation $E$ which we often encounter can be found in the Appendix.



A. Assumptions and Notation 

Throughout this paper we represent vectors and matrices with boldface type. Without loss of generality, assume that the original data $x$ is a $K$-sparse vector for some fixed $K$, or that it is compressible. Vectors which are $K$-sparse contain no more than $K$ nonzero elements, and compressible vectors are ones whose ordered coefficients decay according to a power law (i.e., $|x|_{(k)} \le C_p k^{-p}$, where $|x|_{(k)}$ is the $k$th largest element of $x$, $p \ge 1$, and $C_p$ is a constant which depends only on $p$). Let vector $x_K \in \mathbb{C}^n$ be the best $K$-term approximation to $x$, i.e., it contains the $K$ largest coefficients of $x$ with the rest set to zero. We occasionally refer to this vector as the "head" of $x$. Note that if $x$ is $K$-sparse, then $x = x_K$. With a slight abuse of notation denote $x_{K^c} = x - x_K$ as the "tail" of $x$.

The symbols $\sigma_{\max}(Y)$, $\sigma_{\min}(Y)$, and $\|Y\|_2$ respectively denote the usual maximum, minimum nonzero singular values, and spectral norm of a matrix $Y$. Our analysis will require examination of submatrices consisting of an arbitrary collection of $K$ columns. We use the superscript $(K)$ to represent extremal values of the above spectral measures. For instance, $\sigma_{\max}^{(K)}(Y)$ denotes the largest singular value taken over all $K$-column submatrices of $Y$. Similar definitions apply to $\|Y\|_2^{(K)}$ and $\mathrm{rank}^{(K)}(Y)$, while $\sigma_{\min}^{(K)}(Y)$ is the smallest nonzero singular value over all $K$-column submatrices of $Y$. With these, the perturbations $E$ and $e$ can be quantified with the following relative bounds

$$\frac{\|E\|_2}{\|A\|_2} \le \varepsilon_A, \qquad \frac{\|E\|_2^{(K)}}{\|A\|_2^{(K)}} \le \varepsilon_A^{(K)}, \qquad \frac{\|e\|_2}{\|b\|_2} \le \varepsilon_b, \tag{3}$$

where $\|A\|_2, \|A\|_2^{(K)}, \|b\|_2 \ne 0$. In real-world applications we often do not know the exact nature of $E$ and $e$ and instead are forced to estimate their relative upper bounds. This is the point of view taken throughout most of this treatise. In this study we are only interested in the case where $\varepsilon_A, \varepsilon_A^{(K)}, \varepsilon_b < 1$.



II. CS $\ell_1$ Perturbation Analysis

A. Previous Work 

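In the partially perturbed scenario (i.e., $E = 0$) we are concerned with solving the Basis Pursuit (BP) problem [9]:

$$z^\star = \operatorname*{argmin}_{\hat{z}} \|\hat{z}\|_1 \quad \text{s.t.} \quad \|A\hat{z} - \hat{b}\|_2 \le \varepsilon' \tag{4}$$

for some $\varepsilon' \ge 0$.⁴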

The restricted isometry property (RIP) [10] for any matrix $A \in \mathbb{C}^{m \times n}$ defines, for each integer $K = 1, 2, \ldots$, the restricted isometry constant (RIC) $\delta_K$, which is the smallest nonnegative number such that

$$(1 - \delta_K)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_K)\|x\|_2^2 \tag{5}$$

holds for any $K$-sparse vector $x$. In the context of the RIC, we observe that $\|A\|_2^{(K)} = \sigma_{\max}^{(K)}(A) \le \sqrt{1 + \delta_K}$, and $\sigma_{\min}^{(K)}(A) \ge \sqrt{1 - \delta_K}$.

3 It essentially makes no difference whether we account for the perturbation $E$ on the "encoding side" (2), or on the "decoding side" (7). The model used here was chosen so as to agree with the conventions of classical perturbation theory which we use in Section IV.

4 Throughout this paper absolute errors are denoted with a prime. In contrast, relative perturbations, such as in (3), are not primed.

Assuming $\delta_{2K} < \sqrt{2} - 1$ and $\|e\|_2 \le \varepsilon'$, Candès has shown ([1], Thm. 1.2) that the solution to (4) obeys

$$\|z^\star - x\|_2 \le C_0 K^{-1/2} \|x - x_K\|_1 + C_1 \varepsilon' \tag{6}$$

for some constants $C_0, C_1 \ge 0$ which are reasonably well-behaved and can be calculated explicitly.



B. Incorporating nontrivial perturbation E 

Now assume the completely perturbed situation with $E, e \ne 0$. In this case the BP problem of (4) can be generalized to include a different decoding matrix $\hat{A}$:

$$z^\star = \operatorname*{argmin}_{\hat{z}} \|\hat{z}\|_1 \quad \text{s.t.} \quad \|\hat{A}\hat{z} - \hat{b}\|_2 \le \varepsilon'_{A,K,b} \tag{7}$$

for some $\varepsilon'_{A,K,b} \ge 0$. The following two theorems summarize our results.

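Theorem 1 (RIP for $\hat{A}$). Fix $K = 1, 2, \ldots$. Given the RIC $\delta_K$ associated with matrix $A$ in (5) and the relative perturbation $\varepsilon_A^{(K)}$ associated with (possibly unknown) matrix $E$ in (3), fix the constant

$$\hat{\delta}_{K,\max} := (1 + \delta_K)\left(1 + \varepsilon_A^{(K)}\right)^2 - 1. \tag{8}$$

Then the RIC $\hat{\delta}_K$ for matrix $\hat{A} = A + E$ is the smallest nonnegative number such that

$$(1 - \hat{\delta}_K)\|x\|_2^2 \le \|\hat{A}x\|_2^2 \le (1 + \hat{\delta}_K)\|x\|_2^2 \tag{9}$$

holds for any $K$-sparse vector $x$, where $\hat{\delta}_K \le \hat{\delta}_{K,\max}$.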

Remark 1. Properly interpreting Theorem 1 is important. It is assumed that the only information known about matrix $E$ is its worst-case relative perturbation $\varepsilon_A^{(K)}$, and therefore the bound of $\hat{\delta}_{K,\max}$ in (8) represents a worst-case deviation of $\hat{\delta}_K$. Notice for a given $\varepsilon_A^{(K)}$ that there are infinitely many $E$ which satisfy it. In fact, it is possible to construct nonzero perturbations which result in $\hat{\delta}_K = \delta_K$. For example, suppose $\hat{A} = AU$ for some unitary matrix $U \ne I$, where $I$ is the identity matrix. Clearly here $E = A(U - I) \ne 0$ and yet since $U$ is unitary we have $\hat{\delta}_K = \delta_K$. In this case using $\varepsilon_A^{(K)}$ to calculate $\hat{\delta}_{K,\max}$ could be a gross upper bound for $\hat{\delta}_K$. If more information on $E$ is known,⁵ then much tighter bounds on $\hat{\delta}_K$ can be determined.

Remark 2. The flavor of the RIP is defined with respect to the square of the operator norm. That is, $(1 - \delta_K)$ and $(1 + \delta_K)$ are measures of the square of the minimum and maximum singular values of $K$-column submatrices of $A$, and similarly for $\hat{A}$. In keeping with the convention of classical perturbation theory however, we defined $\varepsilon_A^{(K)}$ in (3) just in terms of the operator norm (not its square). Therefore, the quadratic dependence of $\hat{\delta}_{K,\max}$ on $\varepsilon_A^{(K)}$ in (8) makes sense. Moreover, in discussing the spectrum of $K$-column submatrices of $\hat{A}$, we see that it is really a linear function of $\varepsilon_A^{(K)}$.

Before introducing the next theorem let us define the following constants due to matrix $A$:

$$\kappa_A^{(K)} := \frac{\sqrt{1 + \delta_K}}{\sqrt{1 - \delta_K}}, \qquad \alpha_A := \frac{\|A\|_2}{\sqrt{1 - \delta_K}}. \tag{10}$$

The first quantity bounds the ratio of the extremal singular values of all $K$-column submatrices of $A$:

$$\frac{\sigma_{\max}^{(K)}(A)}{\sigma_{\min}^{(K)}(A)} \le \kappa_A^{(K)}.$$

Actually, for very small $\delta_K$ we have $\kappa_A^{(K)} \approx 1$, which implies that every $K$-column submatrix forms an approximately orthonormal set.
5 See the appendix for more discussion on the different forms of perturbation $E$ which we are likely to encounter.

Also introduce the ratios

$$r_K := \frac{\|x_{K^c}\|_2}{\|x_K\|_2}, \qquad s_K := \frac{\|x_{K^c}\|_1}{\|x_K\|_2}, \tag{11}$$

which quantify the weight of a signal's tail relative to its head. When $x$ is $K$-sparse we have $x_{K^c} = 0$, and so $r_K = s_K = 0$. If $x$ is compressible, then these values are a function of the power $p$ (i.e., the rate at which the coefficients decay), and the cardinality $K$ of the group of its largest entries. For reasonable values of $p$ and $K$, we expect that $r_K, s_K \ll 1$.
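To make the head/tail ratios concrete, the following minimal sketch computes $r_K$ and $s_K$ for a given coefficient sequence. The function name `head_tail_ratios` and the test signal (a pure power law with $p = 2$, $n = 512$, $K = 10$) are illustrative assumptions, not values taken from the paper.

```python
import math

def head_tail_ratios(x, K):
    """Return (r_K, s_K): tail l2- and l1-norms of x divided by the head l2-norm.

    The head holds the K largest magnitudes of x; the tail holds the rest.
    """
    mags = sorted((abs(v) for v in x), reverse=True)
    head, tail = mags[:K], mags[K:]
    head_l2 = math.sqrt(sum(v * v for v in head))
    if head_l2 == 0.0:
        raise ValueError("head of x is zero; the ratios are undefined")
    r_K = math.sqrt(sum(v * v for v in tail)) / head_l2
    s_K = sum(tail) / head_l2
    return r_K, s_K

# Hypothetical compressible signal: k-th largest magnitude equals k^(-p).
n, p, K = 512, 2.0, 10
x = [k ** (-p) for k in range(1, n + 1)]
r_K, s_K = head_tail_ratios(x, K)
print(f"r_K = {r_K:.4f}, s_K = {s_K:.4f}")
```

For a strictly $K$-sparse input the tail is empty and both ratios come out as zero, matching the statement above.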

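Theorem 2 (Stability from completely perturbed observation). Fix the relative perturbations $\varepsilon_A$, $\varepsilon_A^{(K)}$, $\varepsilon_A^{(2K)}$ and $\varepsilon_b$ in (3). Assume the RIC for matrix $A$ satisfies⁶

$$\delta_{2K} < \frac{\sqrt{2}}{\left(1 + \varepsilon_A^{(2K)}\right)^2} - 1, \tag{12}$$

and that general signal $x$ satisfies

$$r_K + \frac{s_K}{\sqrt{K}} < \frac{1}{\kappa_A^{(K)}}. \tag{13}$$

Set the total noise parameter

$$\varepsilon'_{A,K,b} := \left( \frac{\varepsilon_A^{(K)} \kappa_A^{(K)} + \varepsilon_A \alpha_A r_K}{1 - \kappa_A^{(K)}\left(r_K + s_K/\sqrt{K}\right)} + \varepsilon_b \right) \|b\|_2. \tag{14}$$

Then the solution of the BP problem (7) obeys

$$\|z^\star - x\|_2 \le \frac{C_0}{\sqrt{K}} \|x - x_K\|_1 + C_1 \varepsilon'_{A,K,b}, \tag{15}$$

where

$$C_0 = \frac{2\left(1 + (\sqrt{2} - 1)\left[(1 + \delta_{2K})\left(1 + \varepsilon_A^{(2K)}\right)^2 - 1\right]\right)}{1 - (\sqrt{2} + 1)\left[(1 + \delta_{2K})\left(1 + \varepsilon_A^{(2K)}\right)^2 - 1\right]}, \tag{16}$$

$$C_1 = \frac{4\sqrt{1 + \delta_{2K}}\left(1 + \varepsilon_A^{(2K)}\right)}{1 - (\sqrt{2} + 1)\left[(1 + \delta_{2K})\left(1 + \varepsilon_A^{(2K)}\right)^2 - 1\right]}. \tag{17}$$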



Remark 3. Theorem 2 generalizes Candès' results in [1]. Indeed, if matrix $A$ is unperturbed, then $E = 0$ and $\varepsilon_A = \varepsilon_A^{(K)} = 0$. It follows that $\hat{\delta}_K = \delta_K$ in (8), and the RIPs for $A$ and $\hat{A}$ coincide. Moreover, assumption (12) in Theorem 2 reduces to $\delta_{2K} < \sqrt{2} - 1$, and the total perturbation (see (23)) collapses to $\|e\|_2 \le \varepsilon'_b := \varepsilon_b \|b\|_2$ (so that assumption (13) is no longer necessary); both of these are identical to Candès' assumptions in (6). Finally, the constants $C_0, C_1$ in (16) and (17) reduce to the same as outlined in the proof of [1].

The assumption in (13) demands more discussion. Observe that the left-hand side (LHS) is solely a function of the signal $x$, while the right-hand side (RHS) is just a function of the matrix $A$. For reasonably compressible signals, it is often the case that the LHS is on the order of $10^{-2}$ or $10^{-3}$. At the same time, the RHS is always of order $10^{0}$ due to assumption (12). Therefore, there should be a sufficient gap to ensure that assumption (13) holds. Clearly this condition is automatically satisfied whenever $x$ is strictly $K$-sparse.

In fact, more can be said about Theorem 2 for the case of a $K$-sparse input. Notice then that the terms related to $x_{K^c}$ in (14) and (15) disappear, and the accuracy of the solution becomes

$$\|z^\star - x\|_2 \le C_1 \left( \kappa_A^{(K)} \varepsilon_A^{(K)} + \varepsilon_b \right) \|b\|_2.$$

This form of the stability of the BP solution is helpful since it highlights the effect of the perturbation $E$ on the $K$ most important elements of $x$, as well as the influence of the additive noise $e$. Clearly in the absence of any perturbation, a $K$-sparse signal can be perfectly recovered by BP.

It is also interesting to examine the spectral effects due to the first assumption of Theorem 2. Namely, we want to be assured that the maximum rank of submatrices of $A$ is unaltered by the perturbation $E$.

6 Note for $\delta_{2K} \ge 0$, (12) requires that $\varepsilon_A^{(2K)} < \sqrt[4]{2} - 1$.

Lemma 1. Assume condition (12) of Theorem 2 holds. Then for any $k \le 2K$

$$\sigma_{\max}^{(k)}(E) < \sigma_{\min}^{(k)}(A), \tag{18}$$

and therefore

$$\mathrm{rank}^{(k)}(\hat{A}) = \mathrm{rank}^{(k)}(A).$$

We apply this fact in the least squares analysis of Section IV.

The utility of Theorems 1 and 2 can be understood with two simple numerical examples. Suppose that matrix $A$ in (2) represents a system that a signal passes through which in reality has an RIC of $\delta_{2K} = 0.100$. Assume however, that when modeling this system we introduce a worst-case relative error of $\varepsilon_A^{(2K)} = 5\%$ so that we think that the system behaves as $\hat{A} = A + E$. From (8) we can verify that matrix $\hat{A}$ has an RIC $\hat{\delta}_{2K,\max} = 0.213$ which satisfies (12). Thus, if (13) is also satisfied, then Theorem 2 guarantees that the BP solution will have accuracy given in (15) with $C_0 = 4.47$ and $C_1 = 9.06$. Note from (16) and (17) we see that if there had been no perturbation, then $C_0 = 2.75$ and $C_1 = 5.53$.

Consider now a different example. Suppose instead that $\delta_{2K} = 0.200$ with $\varepsilon_A^{(2K)} = 1\%$. Then $\hat{\delta}_{2K,\max} = 0.224$, $C_0 = 4.76$ and $C_1 = 9.64$. Here, if $A$ was unperturbed, then we would have had $C_0 = 4.19$ and $C_1 = 8.47$.

These numerical examples show how the stability constants $C_0$ and $C_1$ of the BP solution get worse with perturbations to $A$. It must be stressed however, that they represent worst-case instances. It is well-known in the CS community that better performance is normally achieved in practice.
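The two examples can be reproduced with a few lines of arithmetic. The sketch below evaluates the worst-case RIC bound (8) and the constants (16) and (17); the function names are illustrative choices, and the inputs are the values quoted in the examples.

```python
import math

SQRT2 = math.sqrt(2.0)

def perturbed_ric_bound(delta, eps):
    """Worst-case RIC of A + E as in (8): (1 + delta)(1 + eps)^2 - 1."""
    return (1.0 + delta) * (1.0 + eps) ** 2 - 1.0

def stability_constants(delta_2k, eps_2k):
    """Return (C0, C1) as in (16) and (17); raises if (12) is violated."""
    d_hat = perturbed_ric_bound(delta_2k, eps_2k)
    denom = 1.0 - (SQRT2 + 1.0) * d_hat
    if denom <= 0.0:
        raise ValueError("assumption (12) is violated for these inputs")
    c0 = 2.0 * (1.0 + (SQRT2 - 1.0) * d_hat) / denom
    c1 = 4.0 * math.sqrt(1.0 + delta_2k) * (1.0 + eps_2k) / denom
    return c0, c1

# (delta_2K, eps_A^(2K)) pairs from the two examples, with and without perturbation.
for delta_2k, eps_2k in [(0.100, 0.05), (0.100, 0.0), (0.200, 0.01), (0.200, 0.0)]:
    d_hat = perturbed_ric_bound(delta_2k, eps_2k)
    c0, c1 = stability_constants(delta_2k, eps_2k)
    print(f"delta={delta_2k:.3f} eps={eps_2k:.2f} -> "
          f"delta_hat_max={d_hat:.3f}, C0={c0:.2f}, C1={c1:.2f}")
```

Setting the relative perturbation to zero recovers the unperturbed constants, which makes the price of the perturbation easy to read off.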



C. Numerical Simulations 

Numerical simulations were conducted in Matlab as follows. In each trial a new matrix $A$ of size $128 \times 512$ was randomly generated with normally distributed entries $\mathcal{N}(0, \sigma^2)$ where $\sigma^2 = 1/128$ (so that the expected $\ell_2$-norm of each column was unity), and the spectral norm of $A$ was calculated. Next, for each relative perturbation $\varepsilon_A = 0, 0.01, 0.05, 0.1$ a different perturbation matrix $E$ with normally distributed entries was generated, and then scaled so that $\|E\|_2 = \varepsilon_A \cdot \|A\|_2$.⁷ A random vector $x$ of sparsity $K = 1, \ldots, 64$ was then randomly generated with nonzero entries distributed as $\mathcal{N}(0, 1)$, and $\hat{b} = Ax$ in (2) was created (note, we set $e = 0$ so as to focus on the effect of perturbation $E$). Finally, given $\hat{b}$ and the $\hat{A} = A + E$ associated with each $\varepsilon_A$, the BP program (7) was implemented with cvx software [11] and the relative error $\|z^\star - x\|_2 / \|x\|_2$ was recorded. One hundred trials were performed for each value of $K$.
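The matrix-generation step of one trial can be sketched as follows. This is not the authors' Matlab code: it is a standard-library Python illustration that assumes a much smaller problem ($16 \times 64$ instead of $128 \times 512$) and estimates the spectral norm by power iteration (the helper `spectral_norm` is hypothetical, and its result is an estimate rather than an exact value). The BP solve itself is omitted.

```python
import math
import random

def spectral_norm(M, iters=100):
    """Estimate the largest singular value of M (a list of rows)
    by power iteration on M^T M, starting from a fixed unit vector."""
    rows, cols = len(M), len(M[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = M v
        w = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = M^T u
        norm_w = math.sqrt(sum(t * t for t in w))
        if norm_w == 0.0:
            return 0.0
        v = [t / norm_w for t in w]
    u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(t * t for t in u))  # ||M v||_2 for the final unit v

random.seed(0)
m, n, eps_A = 16, 64, 0.05           # assumed sizes, smaller than in the paper
sigma = math.sqrt(1.0 / m)           # variance 1/m gives unit expected squared column norm
A = [[random.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]
E = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

# Rescale E so that (estimated) ||E||_2 equals eps_A * ||A||_2.
scale = eps_A * spectral_norm(A) / spectral_norm(E)
E = [[scale * e for e in row] for row in E]
A_hat = [[a + e for a, e in zip(row_a, row_e)] for row_a, row_e in zip(A, E)]

print(f"relative perturbation = {spectral_norm(E) / spectral_norm(A):.4f}")
```

Because the same estimator is used for both norms, the rescaled ratio matches the target closely even when the power iteration has not fully converged.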

Figure 1 shows the relative error averaged over the 100 trials as a function of $K$ for each $\varepsilon_A$. As a reference, the ideal, noise-free case can be seen for $\varepsilon_A = 0$. Now fix a particular value of $K < 30$ and compare the relative error for the three nonzero values of $\varepsilon_A$. It is clear that the error scales roughly linearly with $\varepsilon_A$. For example, when $K = 10$ the relative errors corresponding to $\varepsilon_A = 0.01, 0.05, 0.1$ respectively are $9.7 \times 10^{-3}$, $4.9 \times 10^{-2}$, $9.7 \times 10^{-2}$. We see here that the relative errors for $\varepsilon_A = 0.05$ and $0.1$ are approximately five and ten times the relative error associated with $\varepsilon_A = 0.01$. Therefore, this empirical study essentially confirms the conclusion of Theorem 2: the stability of the BP solution scales linearly with $\varepsilon_A^{(K)}$.

Note that improved performance in theory and in simulation can be achieved if BP is used solely to determine the support of the solution. Then we can use least squares to better approximate the coefficients on this support. This is similar to the best-case, oracle least squares solution discussed in Section IV. However, this method of recovery was not pursued in the present analysis.

Fig. 1. Average (100 trials) relative error of BP solution $z^\star$ with respect to $K$-sparse $x$ vs. sparsity $K$ for different relative perturbations $\varepsilon_A$ of $A$. Here $A$, $E$ are both $128 \times 512$ random matrices with i.i.d. Gaussian entries and $\varepsilon_b = 0$.

7 We used $\varepsilon_A$ in these simulations since calculating $\varepsilon_A^{(K)}$ explicitly is extremely difficult. Notice that $\varepsilon_A \approx \varepsilon_A^{(K)}$ for all $K$ with high probability since both $A$, $E$ are random Gaussian matrices.

III. Proofs

A. Proof of Theorem 1

Recall that we are tasked with determining the maximum $\hat{\delta}_K$ given $\delta_K$ and $\varepsilon_A^{(K)}$. Temporarily define $l_K$ and $u_K$ as the smallest nonnegative numbers such that

$$(1 - l_K)\|x\|_2^2 \le \|\hat{A}x\|_2^2 \le (1 + u_K)\|x\|_2^2 \tag{19}$$

holds for any $K$-sparse vector $x$. From the triangle inequality, (5) and (3) we have

$$\|\hat{A}x\|_2^2 \le \left(\|Ax\|_2 + \|Ex\|_2\right)^2 \tag{20}$$

$$\le \left(\sqrt{1 + \delta_K} + \|E\|_2^{(K)}\right)^2 \|x\|_2^2 \tag{21}$$

$$\le (1 + \delta_K)\left(1 + \varepsilon_A^{(K)}\right)^2 \|x\|_2^2. \tag{22}$$

In comparing the RHS of (19) and (22), it must be that

$$(1 + u_K) \le (1 + \delta_K)\left(1 + \varepsilon_A^{(K)}\right)^2$$

as demanded by the definition of $u_K$. Moreover, this inequality is sharp for the following reasons:

• Equality occurs in (20) whenever $E$ is a positive, real-valued multiple of $A$.

• The inequality in (21) inherits the sharpness of the upper bound of the RIP for matrix $A$ in (5).

• Equality occurs in (22) since, in this hypothetical case, we assume that $E = \beta A$ for some $0 < \beta < 1$. Therefore, the relative perturbation $\varepsilon_A^{(K)}$ in (3) no longer represents a worst-case deviation (i.e., the ratio $\|E\|_2^{(K)} / \|A\|_2^{(K)} = \beta =: \varepsilon_A^{(K)}$).

Since the triangle inequality constitutes a least-upper bound, and since we attain this bound, then

$$u_K := (1 + \delta_K)\left(1 + \varepsilon_A^{(K)}\right)^2 - 1$$

satisfies the definition of $u_K$.

Now the LHS of (19) is obtained in much the same way using the "reverse" triangle inequality with similar arguments (in particular, assume $-1 < \beta < 0$ and $\varepsilon_A^{(K)} := |\beta|$). Thus

$$l_K := 1 - (1 - \delta_K)\left(1 - \varepsilon_A^{(K)}\right)^2.$$

Next, we need to make the bounds of (19) symmetric. Notice that $(1 - u_K) \le (1 - l_K)$ and $(1 + l_K) \le (1 + u_K)$. Therefore, given $\delta_K$ and $\varepsilon_A^{(K)}$, we choose

$$\hat{\delta}_{K,\max} := u_K$$

as the smallest nonnegative constant which makes (19) symmetric. Finally, it is clear that the actual RIC $\hat{\delta}_K$ for $\hat{A}$ obeys $\hat{\delta}_K \le \hat{\delta}_{K,\max}$. Hence, (9) follows immediately. ■
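The ordering of the two constants used in the symmetrization step can be checked directly. Writing $\varepsilon$ for the relative perturbation $\varepsilon_A^{(K)}$ and expanding the expressions for $u_K$ and $l_K$ derived in this proof gives

$$u_K - l_K = (1 + \delta_K)(1 + \varepsilon)^2 + (1 - \delta_K)(1 - \varepsilon)^2 - 2 = 2\varepsilon^2 + 4\delta_K\varepsilon \ge 0,$$

so $l_K \le u_K$, with equality only when $\varepsilon = 0$. This is why the upper constant $u_K$, rather than $l_K$, sets the symmetric bound.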
B. Bounding the perturbed observation

Before proceeding to the proof of Theorem 2 we need several important facts. First we generalize a lemma in [12] about the image of an arbitrary signal.

Proposition 1 ([12], Lemma 29). Assume that matrix $A$ satisfies the upper bound of the RIP in (5). Then for every signal $x$ we have

$$\|Ax\|_2 \le \sqrt{1 + \delta_K}\left(\|x\|_2 + \frac{\|x\|_1}{\sqrt{K}}\right).$$

Now we can establish sufficient conditions for the lower bound in terms of the head and tail of $x$ and the RIC of $A$.

Lemma 2. Assume condition (13) in Theorem 2. Then for general signal $x$, its image under $A$ can be bounded below by the positive quantity

$$\|Ax\|_2 \ge \sqrt{1 - \delta_K}\left(\|x_K\|_2 - \kappa_A^{(K)}\left(\|x_{K^c}\|_2 + \frac{\|x_{K^c}\|_1}{\sqrt{K}}\right)\right).$$

Proof: Apply Proposition 1 to the tail of $x$. Then

$$\|Ax\|_2 \ge \|Ax_K\|_2 - \|Ax_{K^c}\|_2$$

$$\ge \sqrt{1 - \delta_K}\,\|x_K\|_2 - \sqrt{1 + \delta_K}\left(\|x_{K^c}\|_2 + \frac{\|x_{K^c}\|_1}{\sqrt{K}}\right)$$

$$= \sqrt{1 - \delta_K}\left(1 - \kappa_A^{(K)}\left(r_K + \frac{s_K}{\sqrt{K}}\right)\right)\|x_K\|_2 > 0$$

on account of (13). ■



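We still need some sense of the size of the total perturbation incurred by $E$ and $e$. We do not know a priori the exact values of $E$, $x$, or $e$. But we can find an upper bound in terms of the relative perturbations in (3). The main goal in the following lemma is to remove the total perturbation's dependence on the input $x$.

Lemma 3 (Total perturbation bound). Assume condition (13) in Theorem 2 and set⁸

$$\varepsilon'_{A,K,b} := \left( \frac{\varepsilon_A^{(K)} \kappa_A^{(K)} + \varepsilon_A \alpha_A r_K}{1 - \kappa_A^{(K)}\left(r_K + s_K/\sqrt{K}\right)} + \varepsilon_b \right) \|b\|_2,$$

where $\varepsilon_A$, $\varepsilon_A^{(K)}$, $\varepsilon_b$ are defined in (3), $\kappa_A^{(K)}$, $\alpha_A$ in (10), and $r_K$, $s_K$ in (11). Then the total perturbation obeys

$$\|Ex\|_2 + \|e\|_2 \le \varepsilon'_{A,K,b}. \tag{23}$$

Proof: First divide the multiplicative noise term by $\|b\|_2$ and then apply Lemma 2:

$$\frac{\|Ex\|_2}{\|Ax\|_2} \le \frac{\|E\|_2^{(K)}\|x_K\|_2 + \|E\|_2\|x_{K^c}\|_2}{\sqrt{1 - \delta_K}\left(\|x_K\|_2 - \kappa_A^{(K)}\left(\|x_{K^c}\|_2 + \|x_{K^c}\|_1/\sqrt{K}\right)\right)}$$

$$= \frac{\left(\|E\|_2^{(K)} + \|E\|_2\, r_K\right)(1 - \delta_K)^{-1/2}}{1 - \kappa_A^{(K)}\left(r_K + s_K/\sqrt{K}\right)}$$

$$\le \frac{\varepsilon_A^{(K)}\kappa_A^{(K)} + \varepsilon_A \alpha_A r_K}{1 - \kappa_A^{(K)}\left(r_K + s_K/\sqrt{K}\right)}. \tag{24}$$

Including the contribution from the additive noise term completes the proof. ■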
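As a minimal sketch of how the total noise parameter of Lemma 3 might be evaluated in practice, the function below combines the relative perturbations with the constants of (10) and the ratios of (11). The function name, argument order, and the sample numbers are assumptions made for illustration only.

```python
import math

def total_noise(eps_A, eps_A_K, eps_b, delta_K, norm_A, r_K, s_K, K, norm_b):
    """Evaluate the total noise parameter of Lemma 3.

    eps_A, eps_A_K, eps_b : relative perturbations from (3)
    delta_K               : RIC of the unperturbed matrix A
    norm_A, norm_b        : spectral norm of A and l2-norm of b
    r_K, s_K              : tail-to-head ratios from (11)
    """
    kappa = math.sqrt(1.0 + delta_K) / math.sqrt(1.0 - delta_K)  # kappa_A^(K) in (10)
    alpha = norm_A / math.sqrt(1.0 - delta_K)                     # alpha_A in (10)
    gap = 1.0 - kappa * (r_K + s_K / math.sqrt(K))                # positive iff (13) holds
    if gap <= 0.0:
        raise ValueError("assumption (13) is violated for this signal")
    multiplicative = (eps_A_K * kappa + eps_A * alpha * r_K) / gap
    return (multiplicative + eps_b) * norm_b

# Hypothetical inputs: 5% matrix perturbation, 1% additive noise, mildly compressible x.
print(total_noise(eps_A=0.05, eps_A_K=0.05, eps_b=0.01, delta_K=0.1,
                  norm_A=3.0, r_K=0.02, s_K=0.05, K=10, norm_b=2.0))
```

With `r_K = s_K = 0` the expression collapses to $(\kappa_A^{(K)}\varepsilon_A^{(K)} + \varepsilon_b)\|b\|_2$, the $K$-sparse noise level discussed after Theorem 2.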

8 Note that the results in this paper can easily be expressed in terms of the perturbed observation by replacing $\|b\|_2 \le \|\hat{b}\|_2 (1 - \varepsilon_b)^{-1}$. This can be useful in practice since one normally only has access to $\hat{b}$.

C. Proof of Theorem 2

Step 1. We duplicate the techniques used in Candès' proof of Theorem 1.2 in [1], but with decoding matrix $A$ replaced by $\hat{A}$. The proof relies heavily on the RIP for $\hat{A}$ in Theorem 1. Set the BP minimizer in (7) as $z^\star = x + h$. Here, $h$ is the perturbation from the true solution $x$ induced by $E$ and $e$. Instead of Candès' (9), we now determine that the image of $h$ under $\hat{A}$ is bounded by

$$\|\hat{A}h\|_2 \le \|\hat{A}z^\star - \hat{b}\|_2 + \|\hat{A}x - \hat{b}\|_2 \tag{25}$$

$$\le 2\varepsilon'_{A,K,b}.$$

The second inequality follows since both terms on the RHS of (25) satisfy the BP constraint in (7). Notice in the second term that $x$ is a feasible solution due to Lemma 3.

Since the other steps in the proof are essentially the same, we end up with constants $\hat{\alpha}$ and $\hat{\rho}$ in Candès' (14) (instead of $\alpha$ and $\rho$) where

$$\hat{\alpha} := \frac{2\sqrt{1 + \hat{\delta}_{2K}}}{1 - \hat{\delta}_{2K}}, \qquad \hat{\rho} := \frac{\sqrt{2}\,\hat{\delta}_{2K}}{1 - \hat{\delta}_{2K}}. \tag{26}$$

The final line of the proof concludes that

$$\|h\|_2 \le \frac{2(1 + \hat{\rho})}{1 - \hat{\rho}} \cdot \frac{\|x - x_K\|_1}{\sqrt{K}} + \frac{2\hat{\alpha}}{1 - \hat{\rho}}\, \varepsilon'_{A,K,b}. \tag{27}$$

The denominator demands that we impose the condition that $0 < 1 - \hat{\rho}$, or equivalently

$$\hat{\delta}_{2K} < \sqrt{2} - 1. \tag{28}$$

The constants $C_0$ and $C_1$ are obtained by first substituting $\hat{\alpha}$ and $\hat{\rho}$ from (26) into (27). Then, recalling that $\hat{\delta}_{2K} \le \hat{\delta}_{2K,\max}$, substitute $\hat{\delta}_{2K,\max}$ from (8) (with $K \to 2K$).

Step 2. We still need to show that the hypothesis of Theorem 2 implies (28). This is easily verified by substituting the assumption of $\delta_{2K} < \sqrt{2}\left(1 + \varepsilon_A^{(2K)}\right)^{-2} - 1$ into (8) (again with $K \to 2K$) and the proof is complete. ■
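Spelled out, the substitution in Step 2 is a one-line chain. Rearranging the hypothesis of Theorem 2 gives $(1 + \delta_{2K})\left(1 + \varepsilon_A^{(2K)}\right)^2 < \sqrt{2}$, and therefore, by Theorem 1 with $K \to 2K$,

$$\hat{\delta}_{2K} \le \hat{\delta}_{2K,\max} = (1 + \delta_{2K})\left(1 + \varepsilon_A^{(2K)}\right)^2 - 1 < \sqrt{2} - 1,$$

which is exactly the condition needed for the denominator $1 - \hat{\rho}$ to stay positive.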

D. Proof of Lemma 1

Assume (12) in the hypothesis of Theorem 2. It is easy to show that this implies

$$\|E\|_2^{(2K)} < \sqrt[4]{2} - \sqrt{1 + \delta_{2K}}.$$

Simple algebraic manipulation then confirms that

$$\sqrt[4]{2} - \sqrt{1 + \delta_{2K}} < \sqrt{1 - \delta_{2K}} \le \sigma_{\min}^{(2K)}(A).$$

Therefore, (18) holds with $k = 2K$. Further, for any $k \le 2K$ we have $\sigma_{\max}^{(k)}(E) \le \sigma_{\max}^{(2K)}(E)$ and $\sigma_{\min}^{(2K)}(A) \le \sigma_{\min}^{(k)}(A)$, which proves the first part of the lemma. The second part is an immediate consequence. ■



IV. Classical $\ell_2$ Perturbation Analysis

Let the subset $T \subset \{1, \ldots, n\}$ have cardinality $|T| = K$, and note the following $T$-restrictions: $A_T \in \mathbb{C}^{m \times K}$ denotes the submatrix consisting of the columns of $A$ indexed by the elements of $T$, and similarly for $x_T \in \mathbb{C}^K$.

Suppose the "oracle" case where we already know the support $T$ of $x_K$, i.e., the best $K$-sparse representation of $x$.⁹ By assumption, we are only interested in the case where $K < m$ in which $A_T$ has full rank. Given the completely perturbed observation of (7), the least squares problem consists of solving:

$$z_T^{\#} = \operatorname*{argmin}_{\hat{z}_T} \|\hat{A}_T \hat{z}_T - \hat{b}\|_2.$$

9 Although perhaps slightly confusing, note that $x_K \in \mathbb{C}^n$, while $x_T \in \mathbb{C}^K$. Restricting $x_K$ to its support $T$ yields $x_T$.

Since we know the support $T$, it is trivial to extend $z_T^{\#}$ to $z^{\#} \in \mathbb{C}^n$ by zero-padding on the complement of $T$. Our goal is to see how the perturbations $E$ and $e$ affect $z^{\#}$. Using Golub and Van Loan's model ([13], Thm. 5.3.1) as a guide, assume

$$\max\left\{ \frac{\|E_T\|_2}{\|A_T\|_2},\ \frac{\|e\|_2}{\|b\|_2} \right\} < \frac{\sigma_{\min}(A_T)}{\sigma_{\max}(A_T)}. \tag{29}$$

Remark 4. This assumption is fairly easy to satisfy. In fact, assumption (12) in the hypothesis of Theorem 2 immediately implies that $\|E_T\|_2 / \|A_T\|_2 < \sigma_{\min}(A_T) / \sigma_{\max}(A_T)$ for all $\varepsilon_A^{(2K)} \in [0, \sqrt[4]{2} - 1)$. To see this simply set $k = K$ in (18) of Lemma 1 and note that $\|E_T\|_2 \le \|E\|_2^{(K)}$ and $\sigma_{\min}^{(K)}(A) \le \sigma_{\min}(A_T)$. Further, the reasonable condition of $\varepsilon_b \le \left(\sqrt{2}\left(1 + \varepsilon_A^{(2K)}\right)^2 - 1\right)^{1/2}$ is sufficient to ensure $\varepsilon_b < \sqrt{1 - \delta_{2K}} / \sqrt{1 + \delta_{2K}}$ so that assumption (29) holds. Note that this assumption has no bearing on CS recovery, nor is it a constraint due to BP. It is simply made to enable an analysis of the least squares solution which we use as a best-case comparison below.

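Following the steps in [13] with the appropriate modifications for our situation we obtain

$$\|z_T^{\#} - x_T\|_2 \le \|A_T^{\dagger}\|_2 \left( \frac{\|E_T x_T\|_2}{\|Ax\|_2} + \frac{\|e\|_2}{\|b\|_2} \right) \|b\|_2 \le \frac{1}{\sqrt{1 - \delta_K}}\, \zeta_{A,K,b},$$

where $A_T^{\dagger} = (A_T^* A_T)^{-1} A_T^*$ is the left inverse of $A_T$ whose spectral norm obeys

$$\|A_T^{\dagger}\|_2 \le \frac{1}{\sqrt{1 - \delta_K}},$$

and where

$$\zeta_{A,K,b} := \left( \frac{\kappa_A^{(K)} \varepsilon_A^{(K)}}{1 - \kappa_A^{(K)}\left(r_K + s_K/\sqrt{K}\right)} + \varepsilon_b \right) \|b\|_2$$

was obtained using the same steps as in (24). Finally, we obtain the total least squares stability expression

$$\|z^{\#} - x\|_2 \le \|x - x_K\|_2 + \|z^{\#} - x_K\|_2 \le \|x - x_K\|_2 + C_2\, \zeta_{A,K,b} \tag{30}$$

with $C_2 = 1/\sqrt{1 - \delta_K}$.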



A. Comparison of LS with BP 

Now, we can compare the accuracy of the least squares solution in (30) with the accuracy of the BP solution found in (15). However, this comparison is not really appropriate when the original data is compressible since the least squares solution $z^{\#}$ returns a vector which is strictly $K$-sparse, while the BP solution $z^\star$ will never be strictly sparse.

To make the comparison fair, we need to assume that $x$ is strictly $K$-sparse. Then, as mentioned previously, the constants $r_K = s_K = 0$ and the solutions enjoy stability of

$$\|z^{\#} - x\|_2 \le C_2 \left( \kappa_A^{(K)} \varepsilon_A^{(K)} + \varepsilon_b \right) \|b\|_2,$$

and

$$\|z^\star - x\|_2 \le C_1 \left( \kappa_A^{(K)} \varepsilon_A^{(K)} + \varepsilon_b \right) \|b\|_2.$$

Yet, a detailed numerical comparison of $C_2$ with $C_1$, even at this point, is still not entirely valid, nor illuminating. This is due to the fact that we assumed the oracle setup in the least squares analysis, which is the best that one could hope for. In this sense, the least squares solution we examined here can be considered a "best, worst-case" scenario. In contrast, the BP solution really should be thought of as a "worst, of the worst-case" scenarios.

The important thing to glean is that the accuracy of the BP and the least squares solutions are both on the order of the noise level

$$\left( \kappa_A^{(K)} \varepsilon_A^{(K)} + \varepsilon_b \right) \|b\|_2$$

in the perturbed observation. This is an important finding since, in general, no other recovery algorithm can do better than the oracle least squares solution. These results are analogous to the comparison by Candès, Romberg and Tao in [2], although they only consider the case of additive noise $e$.
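For a rough feel of the gap between the two worst-case constants in the strictly $K$-sparse setting, the sketch below evaluates $C_1$ from (17), the least squares constant $C_2 = 1/\sqrt{1 - \delta_K}$, and the shared noise level. The function names and the sample inputs are illustrative assumptions; in particular, the sample uses the same value for $\delta_K$ and $\delta_{2K}$, which is only a conservative stand-in since $\delta_K \le \delta_{2K}$.

```python
import math

def bp_constant_c1(delta_2k, eps_2k):
    """C1 from (17) for the BP bound; raises if (12) is violated."""
    d_hat = (1.0 + delta_2k) * (1.0 + eps_2k) ** 2 - 1.0
    denom = 1.0 - (math.sqrt(2.0) + 1.0) * d_hat
    if denom <= 0.0:
        raise ValueError("assumption (12) is violated for these inputs")
    return 4.0 * math.sqrt(1.0 + delta_2k) * (1.0 + eps_2k) / denom

def ls_constant_c2(delta_k):
    """C2 = 1 / sqrt(1 - delta_K) for the oracle least squares bound."""
    return 1.0 / math.sqrt(1.0 - delta_k)

def sparse_noise_level(delta_k, eps_k, eps_b, norm_b):
    """Noise level (kappa_A^(K) * eps_A^(K) + eps_b) * ||b||_2 for K-sparse x."""
    kappa = math.sqrt((1.0 + delta_k) / (1.0 - delta_k))
    return (kappa * eps_k + eps_b) * norm_b

# Hypothetical inputs: delta_K = delta_2K = 0.1, 5% matrix error, 1% additive noise.
delta, eps, eps_b, norm_b = 0.1, 0.05, 0.01, 1.0
level = sparse_noise_level(delta, eps, eps_b, norm_b)
print(f"noise level     = {level:.4f}")
print(f"BP bound (C1)   = {bp_constant_c1(delta, eps) * level:.4f}")
print(f"LS bound (C2)   = {ls_constant_c2(delta) * level:.4f}")
```

Both bounds are multiples of the same noise level; only the constant in front differs, which is the point made in the comparison above.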
V. Conclusion

We introduced a framework to analyze general perturbations in CS and found the conditions under which BP could stably recover the original data. This completely perturbed model extends previous work by including a multiplicative noise term in addition to the usual additive noise term.

Most of this study assumed no specific knowledge of the perturbations $E$ and $e$. Instead, the point of view was in terms of their worst-case relative perturbations $\varepsilon_A$, $\varepsilon_A^{(K)}$, $\varepsilon_b$. In real-world applications these quantities must either be calculated or estimated. This must be done with care owing to their role in the theorems presented here.

We derived the RIP for perturbed matrix $\hat{A}$, and showed that the penalty on the spectrum of its $K$-column submatrices was a graceful, linear function of the relative perturbation $\varepsilon_A^{(K)}$. Our main contribution, Theorem 2, showed that the stability of the BP solution of the completely perturbed scenario was limited by the total noise in the observation.

Simple numerical examples demonstrated how the multiplicative noise reduced the accuracy of the recovered BP solution. Formal numerical simulations were performed on strictly $K$-sparse signals with no additive noise so as to highlight the effect of perturbation $E$. These experiments appear to confirm the conclusion of Theorem 2: the stability of the BP solution scales linearly with $\varepsilon_A^{(K)}$.

We also found that the rank of $\hat{A}$ did not exceed the rank of $A$ under the assumed conditions. This permitted a comparison with the oracle least squares solution.

It should be mentioned that designing matrices and checking for proper RICs is still quite elusive. In fact, the 
only matrices which are known to satisfy the RIP (and which have m ~ K rows) are random Gaussian, Bernoulli, 
and certain partial unitary (e.g., Fourier) matrices (see, e.g., [14], [15], [16]). 



Appendix 
Different cases of perturbation E 

There are essentially two classes of perturbations $E$ which we care most about: random and structured. The nature of these perturbation matrices will have a significant effect on the value of $\|E\|_2^{(K)}$, which is used in determining $\varepsilon_A^{(K)}$ in (3). In fact, explicit knowledge of $E$ can significantly improve the worst-case assumptions presented throughout this paper. However, if there is no extra knowledge on the nature of $E$, then we can rely on the "worst case" upper bound using the full matrix spectral norm: $\|E\|_2^{(K)} \le \|E\|_2$.



A. Random Perturbations 

Random matrices, such as Gaussian, Bernoulli, and certain partial Fourier matrices, are often amenable to analysis with the RIP. For instance, suppose that $E$ is simply a scaled version of a random matrix $R$ so that $E = \beta R$ with $0 < \beta \ll 1$. Denote $\delta_K^R$ as the RIC associated with the matrix $R$. Then for all $K$-sparse $x$ the RIP for matrix $E$ asserts

$$\beta^2 (1 - \delta_K^R)\|x\|_2^2 \le \|Ex\|_2^2 \le \beta^2 (1 + \delta_K^R)\|x\|_2^2,$$

which immediately gives us

$$\|Ex\|_2 \le \beta\sqrt{1 + \delta_K^R}\,\|x\|_2,$$

and thus

$$\|E\|_2^{(K)} \le \beta\sqrt{1 + \delta_K^R}.$$



B. Structured Perturbations 

Structured matrices (e.g., Toeplitz, banded) are ubiquitous in the mathematical sciences and engineering. In the CS scenario, suppose for example that $E$ is a partial circulant matrix obtained by selecting $m$ rows uniformly at random from an $n \times n$ circulant matrix. An error in the modeling of a communication channel could be represented by such a partial circulant matrix. When encountering a structured perturbation such as this it may be possible to exploit its nature to find a bound $\|E\|_2^{(K)} \le C$.

A complete circulant matrix has the property that each row is simply a right-shifted version of the row above it. Therefore, knowledge of any row gives information about the entries of all of the rows. This is also true for a partial circulant matrix. Thus, with this information we may be able to find a reasonable upper bound on $\|E\|_2^{(K)}$. The interested reader can find relevant literature at [17].